PREFACE


In  this  Memorandum  the  author  indicates  how  the 
mathematical  technique  of  dynamic  programming  can  be  used 
to  handle  a  number  of  processes  that  arise  in  biology, 
engineering,  economics,  and  psychology,  and,  in  general, 
to  deal  with  a  wide  class  of  problems  that  require 
learning and adaptation because of insufficient information
about the nature of the underlying process.


SUMMARY 


The  intensive  study  in  recent  years  of  a  variety  of 
descriptive  and  variational  processes,  such  as  those 
which  arise  in  biology,  psychology,  engineering,  and 
economics,  has  uncovered  many  problems  which  are  too 
complex to be solved by classical mathematical techniques.
In  order  to  describe  some  of  the  difficulties  involved, 
the  author  briefly  reviews  the  essentials  of  the 
classical  approach  for  dealing  with  processes  of  this 
sort,  in  which  there  is  insufficient  information  about 
the  state  variables.  He  then  indicates  some  of  the  ways 
in  which  dynamic  programming  and  adaptive  control  may  be 
used  to  bridge  the  gap  between  classical  and  modern 
theories.  Finally,  the  author  indicates  some  of  the 
problems  encountered  in  the  study  of  adaptive  processes 
and  suggests  some  directions  for  future  research. 
DYNAMIC PROGRAMMING, LEARNING, AND ADAPTIVE PROCESSES


1.  INTRODUCTION

The  recent  intensive  study  of  biological,  medical, 
psychological,  engineering,  and  computer  processes  has 
uncovered  large  numbers  of  problems  which  escape  not 
only solution by means of classical mathematical techniques,
but even formulation.

In  order  to  see  what  some  of  the  difficulties  are, 
it  is  necessary  to  understand  the  essential  features  of 
the  classical  approach  to  descriptive  and  variational 
processes.  We  shall  briefly  review  the  essentials  of 
this  approach  and  then  indicate  some  of  the  ways  in  which 
dynamic  programming  furnishes  a  natural  bridge  between 
classical and modern theories.

Finally  we  shall  indicate  some  of  the  major  problems 
which  are  encountered  in  the  study  of  adaptive  processes 
and suggest some directions of research.

2.  DETERMINISTIC  DESCRIPTIVE  PROCESSES 

Let  S  be  a  physical  system  under  examination  and 
let us introduce a set of variables x_1, x_2, ..., x_N
describing the state of the system at any time t. The
vector x(t) = (x_1(t), ..., x_N(t)) is called the state
vector.  To  determine  the  behavior  of  the  system  over 
time,  we  further  postulate  an  equation  of  the  form 

(2.1)  dx/dt = g(x(s), -\infty < s < t),


where  the  notation  indicates  that  the  function  g  depends 
upon  the  entire  past  history  of  the  process.  In  many 
situations,  we  can  assume  that  (2.1)  has  the  form  of  an 
ordinary  differential  equation 

(2.2)  dx/dt = g(x),  x(0) = c;

see  [1]  for  the  more  general  case. 

The  study  of  the  properties  of  the  system  S  has 
thus  been  reduced  to  the  study  of  the  analytic  behavior 
of  the  solutions  of  a  differential  equation,  a 
considerable  reduction  in  difficulty. 
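
As an illustration that is not part of the original text, the reduction to (2.2) may be made concrete with a minimal numerical sketch. The forward Euler scheme, the function name euler_trajectory, and the test system dx/dt = -x are all assumptions chosen for brevity; any standard integration method would serve.

```python
def euler_trajectory(g, c, t_end, steps):
    """Approximate x(t) for dx/dt = g(x), x(0) = c, by forward Euler steps."""
    h = t_end / steps
    x = list(c)                       # state vector x = (x_1, ..., x_N)
    path = [tuple(x)]
    for _ in range(steps):
        dx = g(x)                     # right-hand side of (2.2)
        x = [xi + h * di for xi, di in zip(x, dx)]
        path.append(tuple(x))
    return path


# Hypothetical example system: scalar decay dx/dt = -x with x(0) = 1.
path = euler_trajectory(lambda x: [-x[0]], [1.0], t_end=1.0, steps=1000)
final_value = path[-1][0]             # close to exp(-1)
```

Once g and c are given, the whole future of the system is determined; this is the sense in which the description is deterministic.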

3.  STOCHASTIC  DESCRIPTIVE  PROCESSES

It was soon recognized that this concise description
of  a  physical  process  was  either  not  available  or  not 
applicable  in  a  large  number  of  significant  situations. 
Either  the  functions  g(x)  were  not  known,  or  if 
precisely  known,  of  such  complicated  form  as  to  be 
unusable  due  to  the  high  dimension  of  the  vector  x.  In 
other  cases,  the  initial  state  was  not  known. 

To  circumvent  these  difficulties,  which  at  first 
sight  appear  to  be  major  obstacles  to  progress,  random 
variables were introduced, with average behavior
replacing  unique  behavior  over  time. 

Thus,  (2.2)  might  be  replaced  by 

(3.1)  dx/dt = g(x(t), r(t)),  x(0) = c,
where c is a random variable and x(t) is a random
function  of  t.  In  some  cases,  as  in  quantum  mechanics, 
the  random  variables  are  not  explicit  and  the  equations 
are  of  the  type  shown  in  (2.2),  with  the  components 
representing  probabilities  or  else  functions  from  which 
probabilities are generated.
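
The replacement of unique behavior by average behavior can be sketched as follows. This is an illustrative assumption, not the author's computation: the system g(x, r) = -x + r, the uniform distribution for c, the Gaussian disturbance r(t), and the name sample_path_end are all hypothetical choices.

```python
import random


def sample_path_end(rng, t_end=1.0, steps=100):
    """Simulate one realization of dx/dt = g(x, r) with g(x, r) = -x + r."""
    h = t_end / steps
    x = rng.uniform(0.0, 2.0)         # random initial state c (assumed uniform)
    for _ in range(steps):
        r = rng.gauss(0.0, 0.1)       # random disturbance r(t) (assumed Gaussian)
        x += h * (-x + r)             # one Euler step of (3.1)
    return x


rng = random.Random(0)                # fixed seed so the run is repeatable
ends = [sample_path_end(rng) for _ in range(2000)]
mean_end = sum(ends) / len(ends)      # average behavior at t = 1
```

No single trajectory is predicted here; only the average over many realizations is a meaningful quantity.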

4.  DETERMINISTIC  VARIATIONAL  PROCESSES 

In  the  study  of  control  processes  in  engineering 
and  economics,  we  encounter  quite  naturally  the  problem 
of  minimizing  functionals  of  the  form 

(4.1)  J(x) = \int_0^T g(x, x', t) dt,

where  x  is  subject  to  various  initial  and  terminal 
conditions  as  well  as  to  local  and  global  constraints. 

In  mathematical  physics,  these  questions  arise  in 
connection  with  alternative  formulations  of  the  behavior 
of  systems. 
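
To make the functional (4.1) concrete, the following sketch evaluates J numerically for two candidate curves. The midpoint rule, the integrand g = (x')^2, and the names functional_J and integrand are assumptions introduced only for illustration.

```python
def functional_J(x, integrand, T, n=1000):
    """Approximate J(x) = integral from 0 to T of g(x, x', t) dt (midpoint rule)."""
    h = T / n
    total = 0.0
    for k in range(n):
        t0, t1 = k * h, (k + 1) * h
        x_mid = 0.5 * (x(t0) + x(t1))        # x at the middle of the interval
        dx = (x(t1) - x(t0)) / h             # finite-difference estimate of x'
        total += integrand(x_mid, dx, 0.5 * (t0 + t1)) * h
    return total


# Hypothetical integrand g(x, x', t) = (x')^2 on [0, 1], with x(0) = 0, x(1) = 1.
integrand = lambda x, dx, t: dx * dx
J_line = functional_J(lambda t: t, integrand, T=1.0)          # about 1
J_parabola = functional_J(lambda t: t * t, integrand, T=1.0)  # about 4/3
```

Both curves satisfy the same end conditions, yet the straight line gives the smaller value of J; the variational problem is to find the curve for which J is least.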

5.  DISCUSSION 

In  pursuing  this  classical  route,  we  tacitly  assume 
detailed  knowledge  of  the  following: 

(5.1)  (a)  number of state variables,

       (b)  cause and effect,

       (c)  values of state variables, initially and
            throughout the process,

       (d)  probability distributions, if random
            variables are present,

       (e)  criteria, if the processes are of
            variational type.

How  do  we  proceed  if  this  information  is  not 
available? 

6.  LEARNING  AND  ADAPTIVE  PROCESSES 

Since  we  are  treating  new  types  of  processes  and 
problems, it is reasonable to expect that we will introduce
some new concepts and some new analytic tools. The
new  concepts  are  those  of  learning  and  adaptation,  and 
the  new  tools  are  dynamic  programming  and  adaptive 
control.  Just  as  the  boundary  between  learning  and 
adaptation  is  not  precise,  so  there  is  considerable 
overlap  between  dynamic  programming  and  adaptive  control. 

It  is  clear  that  there  is  little  to  be  done  about 
ignorance  in  the  short  run.  Hence,  we  focus  our 
attention  upon  multistage  processes  where  information  is 
obtained  at  each  stage.  The  basic  problem  is  that  of 
using  this  information  so  as  to  improve  decision  making. 

Fortunately,  a  fundamental  idea  from  the  field  of 
engineering,  namely,  feedback  control,  provides  the 
essential  clue.  A  mathematical  abstraction  of  this 
leads  to  the  theory  of  dynamic  programming  [2,3,4]. 
With this mathematical apparatus we can handle a
number  of  processes  which  arise  in  psychology,  biology, 
medicine,  economics,  and  industry — all  fields  where 
learning,  adaptation,  and  feedback  play  primary  roles. 

The  feedback  to  mathematics  itself  is  in  the  form 
of  new  ideas  and  new  fields  in  which  to  roam. 

7.  ITERATION  AND  TRANSFORMATIONS 

Let  us  begin  at  the  classical  level  with  the  concept 
of  a  transformation.  Let  p,  a  point  in  phase  space, 
denote  the  state  of  a  system  S  and  let  T(p)  denote 
the  state  a  unit  of  time  later.  Then  the  behavior  of 
the  system  over  time  is  equivalent  to  the  study  of  the 
iterates p_1, p_2, ..., where

(7.1)  p_1 = T(p),  p_2 = T(p_1),  ...,  p_{n+1} = T(p_n).
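
The iterates of (7.1) can be generated by a few lines of code. The function name iterates and the example transformation T(p) = p/2 + 1 are assumptions made for illustration only.

```python
def iterates(T, p, n):
    """Return [p_1, ..., p_n], where p_1 = T(p) and p_{k+1} = T(p_k)."""
    out = []
    for _ in range(n):
        p = T(p)                      # apply the fixed transformation once more
        out.append(p)
    return out


# Hypothetical transformation on the real line, starting from p = 0.
orbit = iterates(lambda p: p / 2 + 1, 0.0, 5)
# orbit == [1.0, 1.5, 1.75, 1.875, 1.9375]
```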

8.  DYNAMIC  PROGRAMMING

Let  us  now  extend  this  idea  in  the  following  way. 
Instead  of  keeping  the  transformation  fixed  over  time, 
let  us  suppose  that  we  have  a  choice  of  the  transformation 
to  be  applied  at  each  stage.  If  q  denotes  the  choice 
variable,  or  control  variable,  we  have 

(8.1)  p_1 = T(p, q_1),  p_2 = T(p_1, q_2),  ...,  p_{n+1} = T(p_n, q_{n+1}).

The q_i are to be chosen so as to minimize a given
criterion function

(8.2)  R(p, p_1, p_2, ...; q_1, q_2, ...).


A set of q_i is called a policy, and a set which
minimizes the criterion function is called an optimal policy.

If  we  assume  that  R  has  a  separable  structure, 

(8.3)  R = g(p, q_1) + g(p_1, q_2) + ...,

and introduce the function

(8.4)  f(p) = \min_{\{q_i\}} R,

then  the  principle  of  optimality  [2,3,4]  yields  the 
functional  equation 

(8.5)  f(p) = \min_{q_1} [g(p, q_1) + f(T(p, q_1))].

In  the  continuous  case,  the  analogue  of  (8.5)  yields 
as  a  by-product  the  Euler  equation  and  the  entire  set  of 
classical  conditions  of  the  calculus  of  variations  [5]. 
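
A finite-stage analogue of (8.5) is easily put in computational form. The sketch below is an illustration under stated assumptions, not the author's formulation: the horizon is truncated to a fixed number of stages with zero terminal cost, and the integer state space, the control set {-1, 0, 1}, the transformation T(p, q) = p + q, and the cost g(p, q) = p^2 + |q| are hypothetical.

```python
from functools import lru_cache

CONTROLS = (-1, 0, 1)                 # admissible choices of q (assumed)


def T(p, q):
    """State one stage later when control q is applied in state p."""
    return p + q


def g(p, q):
    """Single-stage cost (assumed form): penalize distance from 0 and effort."""
    return p * p + abs(q)


@lru_cache(maxsize=None)
def f(p, stages):
    """Minimum total cost from state p with the given number of stages left."""
    if stages == 0:
        return 0
    # Principle of optimality: best first decision plus optimal continuation.
    return min(g(p, q) + f(T(p, q), stages - 1) for q in CONTROLS)


def first_decision(p, stages):
    """An optimal first control q_1 for state p."""
    return min(CONTROLS, key=lambda q: g(p, q) + f(T(p, q), stages - 1))


best_cost = f(2, 3)                   # equals 7 for this example
best_q = first_decision(2, 3)         # equals -1: move toward the origin
```

The policy is obtained as a function of the current state rather than as a fixed sequence of controls, which is the feedback feature exploited in the sections that follow.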

9.  ABSTRACTION  AND  EXTENSION 

Since  we  have  carefully  avoided  defining  the  phase 
space  to  which  p  belongs,  nothing  prevents  us  from 
taking p to be a point in an infinite-dimensional space
or  from  choosing  as  components  of  p  probability 
distributions,  past  histories,  and  so  on. 

We thus have a quite general formulation of multistage
decision processes. It remains to apply this
formalism  to  the  study  of  learning  and  adaptive  processes. 


10.  ADAPTIVE  PROCESSES  AND  LEARNING 

The  fundamental  tool  for  treating  ignorance  is 
probability  theory.  If  we  do  not  know  the  value  of  a 
parameter,  we  assume  that  it  is  a  random  variable  with  a 
given  probability  distribution.  If  we  do  not  know  the 
probability distribution, we take it to be a random
probability distribution, an element of a family of
probability distributions. If we do not know the
family ... and so on. In this way, we are led quite
naturally to the consideration of hierarchies of
uncertainties; see the discussion in [6].

The  generalized  state  of  a  system  S  in  an 
adaptive  process  consists  then  not  only  of  the  usual 
physical  state,  but  contains  also  the  best  current 
estimates  of  unknown  quantities.  These  estimates  may  be 
numbers,  e.g.,  expected  values  and  variances,  or  they 
may  be  probability  distributions. 

At  each  stage  of  the  decision  process  we  must  make 
a  decision,  a  choice  of  q,  and  we  must  estimate  the 
new  state  T(p,q)  on  the  basis  of  new  information.  Note 
that  in  many  cases,  part  of  the  decision  process  is  the 
determination  of  how  much  effort  is  to  be  devoted  to 
obtaining  additional  information. 

For  analytical  details,  see  [7],  [4]. 
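
The idea of a generalized state can be illustrated by a small sketch, offered as an assumption-laden example rather than as the treatment of [7] or [4]. At each stage we choose between a familiar action with known success probability and an unfamiliar action whose success probability is unknown and is given a Beta prior; the counts (a, b) of that prior are the estimate carried in the state. The return is maximized here rather than minimized, and the names KNOWN_P and value are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

KNOWN_P = Fraction(1, 2)              # assumed success probability, familiar action


@lru_cache(maxsize=None)
def value(a, b, n):
    """Maximum expected number of successes over n remaining stages.

    (a, b) are Beta-posterior counts for the unfamiliar action; together
    with n they form the generalized state of the adaptive process.
    """
    if n == 0:
        return Fraction(0)
    m = Fraction(a, a + b)            # current estimate of the unknown probability
    stay = KNOWN_P + value(a, b, n - 1)             # nothing new is learned
    probe = (m * (1 + value(a + 1, b, n - 1))       # success: estimate rises
             + (1 - m) * value(a, b + 1, n - 1))    # failure: estimate falls
    return max(stay, probe)


two_stage_value = value(1, 1, 2)      # equals 13/12 under a uniform prior
```

With two stages and a uniform prior, trying the unfamiliar action first yields 13/12, against 1 for never trying it, although both actions have the same estimated success probability at the outset. The difference is the value of the information obtained.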

"Learning"  can  now  be  interpreted  on  several  levels, 
consistent  with  the  concept  of  hierarchies  of  uncertainty. 


It  is  first  of  all  the  ability  to  estimate  efficiently 
at  each  stage  so  that  ultimately  the  unknown  elements 
become  known.  It  is  secondly  the  ability  to  estimate 
inabilities-to-estimate on the basis of a model of simple
uncertainties,  and  to  introduce  more  sophisticated 
uncertainties,  and  so  on. 

We  see  then  that  we  are  led  to  the  concept  of 
levels  of  intelligence,  an  idea  which  is  quite  important 
in  connection  with  the  construction  of  automata. 

11.  APPLICATIONS 

Let  us  note  that  these  ideas  can  be  applied  to  the 
construction  of  simulation  processes,  both  in  the  business 
area [8] and in the field of psychiatry [9]. They afford
a simple and flexible framework for the study of many
multistage processes and have many immediate uses in
modern control theory [4].

12.  COMPLEXITY 

As  far  as  obtaining  numerical  answers  to  numerical 
questions  is  concerned,  we  are  nowhere  near  a  satisfactory 
situation.  If  the  dimension  of  p  is  small,  we  have 
efficient  routine  techniques  using  digital  computers;  if 
the  dimension  is  large,  e.g.,  10,  or  if  p  has  components 
which  are  functions,  these  methods  fail.  Although  a 
number  of  approximate  methods  exist  which  enable  us  to 
treat many additional classes of problems (e.g.,
polynomial approximation, stochastic approximation), we
have not really come to grips with complexity.
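
The difficulty with large dimension can be seen from a simple count. The sketch below assumes, for illustration only, that f(p) is tabulated on a grid with 100 points in each component of p; the figure 100 and the name grid_points are hypothetical.

```python
def grid_points(points_per_axis, dimension):
    """Number of table entries needed to tabulate f(p) on a full grid."""
    return points_per_axis ** dimension


# Table sizes for a grid of 100 points per component (assumed resolution).
table_sizes = {d: grid_points(100, d) for d in (1, 2, 3, 10)}
# table_sizes[10] == 10**20, far beyond any direct tabulation
```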

In  particular,  we  have  no  idea  at  the  present  of  how 
the  human  mind  handles  situations  involving  huge  masses 
of  data,  conflicting  information,  and  imprecise  criteria, 
and  then  makes  a  decision. 

It  seems  quite  clear  that  when  we  someday  understand 
the  neurophysiological  basis  of  the  human  memory,  or 
memories, and the human data-retrieval system, then we
shall  make  progress  in  other  areas.  Furthermore,  when 
we  agree  to  emancipate  ourselves  from  the  restriction  of 
universally true theorems and theories and study approximations
in logical space, then we shall develop powerful
approximation  methods  in  science. 